
Refactor of the library core #98

Merged
Mirrowel merged 37 commits into dev from refactor/core on Jan 24, 2026

Conversation

@Mirrowel (Owner)

A big refactor of `client.py` and `usage_manager.py` into a modular and extensible system that was VERY needed, allowing much easier addition of new features: providers, windows, hooks.

This commit executes a major architectural refactor, decomposing the monolithic `client.py` and `usage_manager.py` files into modular, domain-specific packages to improve maintainability and separation of concerns.

- **Client Refactor**: `RotatingClient` is now a lightweight facade delegating to:
  - `RequestExecutor`: Unified retry and rotation logic.
  - `StreamingHandler`: Stream processing and error detection.
  - `CredentialFilter`: Tier and priority compatibility filtering.
  - `ModelResolver`: Model name resolution and whitelisting.
  - `ProviderTransforms`: Provider-specific request mutations.

- **Usage Manager Refactor**: `UsageManager` logic is now distributed across:
  - `TrackingEngine`: Usage recording and window management.
  - `LimitEngine`: Enforcement of cooldowns, caps, and fair cycle limits.
  - `SelectionEngine`: Credential selection strategies (balanced/sequential).
  - `CredentialRegistry`: Stable identity management for credentials.
  - `UsageStorage`: Resilient async JSON persistence.

- **Core Infrastructure**: Added `src/rotator_library/core` for shared types, error definitions, and centralized configuration loading.
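For illustration, a minimal sketch of the facade shape this describes. The component classes are the ones listed above, but every constructor and method signature here is an assumption, not the library's actual API:

```python
# Minimal structural sketch only -- signatures are illustrative assumptions.
class RotatingClient:
    def __init__(self, config):
        self.model_resolver = ModelResolver(config)        # name resolution
        self.credential_filter = CredentialFilter(config)  # tier/priority filtering
        self.transforms = ProviderTransforms(config)       # request mutations
        self.executor = RequestExecutor(config)            # retry/rotation

    async def acompletion(self, model, messages, **kwargs):
        # The facade only orchestrates; each concern lives in its component.
        resolved = self.model_resolver.resolve(model)
        candidates = self.credential_filter.filter(resolved)
        request = self.transforms.apply(resolved, messages, **kwargs)
        return await self.executor.execute(resolved, candidates, request)
```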

BREAKING CHANGE: The files `src/rotator_library/client.py` and `src/rotator_library/usage_manager.py` have been deleted. `client.py` is replaced by the `client` package. `usage_manager.py` is replaced by the `usage` package. Direct imports from `rotator_library.usage_manager` must be updated to `rotator_library.usage` or top-level exports.
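For example, code that previously imported the module directly would change along these lines (assuming `UsageManager` is re-exported from the new package, per the note above):

```python
# Before (module removed in this PR):
# from rotator_library.usage_manager import UsageManager

# After -- via the new package:
from rotator_library.usage import UsageManager
# or via the top-level exports:
from rotator_library import UsageManager
```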
…mplementation

- Preserve the original monolithic implementation in `_client_legacy.py` and `_usage_manager_legacy.py`.
- Update `RequestExecutor` to support transaction logging, request sanitization, and consecutive quota failure detection.
- Implement `ConcurrentLimitChecker` in the usage limit engine to enforce `max_concurrent` constraints.
- Improve `StreamingHandler` with robust error buffering for fragmented JSON responses.
- Add fair cycle reset logic and quota baseline synchronization to the new `UsageManager`.
This overhaul introduces smart queuing for credential acquisition and shared quota tracking.

- Implement async waiting in `UsageManager`: calls to `acquire_credential` now block (up to a deadline) using asyncio primitives when keys are busy or on cooldown.
- Add Quota Group synchronization: request counts are now synced across models that share a specific quota pool (e.g., Antigravity variants).
- Add support for cached prompt token tracking in usage statistics.
- Refactor `RequestExecutor` to reuse a shared `httpx.AsyncClient` for better performance.
- Correct token counting for Antigravity models by including preprompt overhead.

BREAKING CHANGE: `UsageManager.acquire_credential` is now `async` and must be awaited. `RequestExecutor` now requires an `http_client` argument during initialization.
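A rough caller-side sketch under these new signatures. The `deadline` keyword and the `cred_context.credential` attribute match the call sites quoted later in this thread; the trimmed argument list and everything else are assumptions:

```python
import asyncio
import httpx

# RequestExecutor now requires a shared AsyncClient; other constructor
# arguments are omitted here and would be assumptions.
http_client = httpx.AsyncClient()

async def acquire_and_call(usage_manager, model):
    # Blocks for up to ~30s while all keys are busy or on cooldown.
    deadline = asyncio.get_running_loop().time() + 30.0
    async with await usage_manager.acquire_credential(
        model=model, deadline=deadline
    ) as cred_context:
        cred = cred_context.credential
        ...  # issue the request with `cred`
```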
- **error_handler**: Implement logic to extract `quotaValue` and `quotaId` from Google/Gemini error responses for better rate limit observability.
- **streaming**: Remove legacy `UsageManager` support and the `__init__` method from `StreamingHandler`; usage recording is now delegated to `CredentialContext`.
- **client**: Handle `_parent_log_dir` internal parameter to configure transaction logger output directory.
This commit introduces a comprehensive overhaul of the usage tracking system to support more complex quota management scenarios and integration capabilities.

- **Granular Usage Scopes**: Usage windows can now be scoped to specific models, quota groups, or credentials via `WindowDefinition.applies_to`, enabling precise limit enforcement (e.g., shared model quotas vs. individual key limits; a sketch follows this list).
- **Cost Calculation**: Integrated `litellm` cost calculation for both standard and streaming requests. `approx_cost` is now tracked and persisted in usage statistics.
- **Provider Hooks**: Added `HookDispatcher` and `on_request_complete` interface, allowing provider plugins to intercept request results, override usage counts, or trigger exhaustion based on custom logic.
- **Usage API**: Introduced `UsageAPI` facade (`src/rotator_library/usage/integration/api.py`) to provide a stable interface for external components to query state and manage cooldowns.
- **Fair Cycle Enhancements**: Refined fair cycle tracking with support for global state persistence and configurable tracking modes (credential-level vs. group-level).
- **Configuration**: Expanded environment variable support for custom caps, fair cycle settings, and sequential fallback multipliers.
- **Persistence**: Updated storage schema to handle nested `model_usage` and `group_usage` statistics.
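To make the scoping concrete, a hypothetical shape for a scoped window definition. Only the `applies_to` field is named by this PR; the other fields and values are illustrative assumptions:

```python
from dataclasses import dataclass

@dataclass
class WindowDefinition:
    duration_seconds: int
    max_requests: int
    applies_to: str  # e.g. "model", "quota_group", or "credential"

# A shared per-group daily quota vs. an individual per-key limit:
shared_daily = WindowDefinition(86400, 1000, applies_to="quota_group")
per_key_daily = WindowDefinition(86400, 100, applies_to="credential")
```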

Also in this commit:
- feat(client): add automatic request validation via provider plugins
- fix(streaming): correctly track cached vs. uncached tokens in usage stats
- refactor: add backward compatibility shim in `src/rotator_library/usage_manager.py`
Refines the request executor and client to support deep observability and dynamic quota management within the modular architecture.

- **Observability**:
  - Implement a sanitized LiteLLM logging callback to safely pipe provider logs to the library logger.
  - Capture response headers in `UsageManager`, `CredentialContext`, and failure logs to aid debugging.
  - Pass request headers to the failure logger for better context.
- **Usage Management**:
  - Implement `_apply_usage_reset_config` to dynamically generate rolling window definitions (e.g., daily limits) based on provider settings.
  - Fix `fair_cycle_key` resolution logic in the tracking engine.
- **Client & Executor**:
  - Support passing provider-specific LiteLLM parameters via `RequestExecutor`.
  - Update `BackgroundRefresher` to support the new usage manager registry and prevent crashes when managers are missing.
…ization

This commit overhauls how usage statistics are aggregated and initialized to support shared quotas and improve stability.

- **Quota Groups**: Implemented `_backfill_group_usage` to derive shared group statistics (e.g., tiered limits) from individual model windows. Updated `CooldownChecker` to enforce limits at both model and group levels.
- **Initialization**: Added `initialize_usage_managers` with async locking to `RotatingClient`. Updated `main.py` and `background_refresher.py` to invoke this explicitly, ensuring state is loaded before traffic.
- **Persistence**: Switched storage to `ResilientStateWriter` and `safe_read_json`, which use atomic writes to prevent data corruption (a sketch follows this list).
- **Providers**: Refined quota calculations (calculating `quota_used` from fractions) for Chutes, Firmware, and NanoGPT.
- **Antigravity**: Updated system prompts to strictly enforce parallel tool calling behavior.
- **Logging**: Implemented batched logging for quota exhaustion events to reduce noise.
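The atomic-write technique such a resilient writer typically relies on looks like this. This is a generic sketch of the pattern, not the actual `ResilientStateWriter` internals:

```python
import json
import os
import tempfile

def atomic_write_json(path: str, data: dict) -> None:
    # Write to a temp file in the same directory, then atomically replace
    # the target, so readers never observe a half-written file.
    directory = os.path.dirname(path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(data, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, path)  # atomic on POSIX and Windows
    except BaseException:
        os.unlink(tmp_path)
        raise
```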
…t execution

This commit introduces comprehensive usage tracking and refactors the client execution flow for better observability and stability.

- Refactor `RotatingClient.acompletion` to be explicitly `async`, ensuring proper execution in `proxy_app`.
- Implement detailed usage summaries in `get_usage_stats`, including token caching percentages, approximate costs, and detailed provider states.
- Add granular logging in `RequestExecutor` to trace credential availability (displaying blocks by cooldowns, fair cycle, or caps) and current quota window saturation.
- Introduce debounced state saving in `UsageManager` to optimize storage I/O and add logic for backfilling model usage data.

BREAKING CHANGE: `RotatingClient.acompletion` is now an `async` function and must be awaited by the caller.
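Caller-side impact, sketched (model and message values are placeholders):

```python
# Before: response = client.acompletion(model=..., messages=...)
# After -- the coroutine must be awaited:
response = await client.acompletion(
    model="provider/some-model",  # placeholder
    messages=[{"role": "user", "content": "ping"}],
)
```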
…window manager

- Update `RequestExecutor` to await `usage_manager.get_availability_stats`, ensuring non-blocking execution during availability checks.
- Expose `window_manager` directly as a property on `UsageManager`.
- Refactor `RequestExecutor` to access `window_manager` directly instead of traversing via `limits.windows`.
Expanded the usage tracking system to capture and persist detailed token metrics, specifically reasoning (thinking) tokens and cache creation tokens.

- Updated client executors to extract `reasoning_tokens` and `cache_creation_tokens` from provider responses.
- Extended `UsageStats` and `WindowStats` models to store granular token breakdowns and explicit success/failure counts.
- Adapted storage and aggregation logic to persist and calculate these new metrics across quota windows.
…tions

- Archive the existing `RotatingClient` and `UsageManager` logic into new `_legacy` modules (`_client_legacy.py` and `_usage_manager_legacy.py`). This preserves the stable baseline implementation to facilitate reference and comparison during the ongoing core architecture refactor.
- Add `src/rotator_library/providers/example_provider.py` as a comprehensive reference template. This file documents the standard patterns for implementing providers with advanced usage management, quota groups, custom token counting, and background refresh jobs.
…ntation

This introduces a thread-safe mechanism using `contextvars` to accurately track and report internal API retries within providers.

- Implement `ContextVar` retry counting in `AntigravityProvider` to capture hidden API costs from internal retries (e.g., on empty responses or malformed calls).
- Update `ExampleProvider` with comprehensive patterns for custom usage handling, including retry counting and error-based cooldowns.
- Expand documentation for `UsageAPI` and `HookDispatcher` with detailed usage examples and architectural context.
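The `contextvars` pattern described here, as a minimal standalone sketch (helper and variable names are illustrative, not the provider's actual ones):

```python
import contextvars

# Each asyncio task sees its own copy, so counting is safe under concurrency.
_retry_count: contextvars.ContextVar[int] = contextvars.ContextVar(
    "internal_retry_count", default=0
)

def note_internal_retry() -> None:
    # Called wherever the provider silently retries (e.g. empty response).
    _retry_count.set(_retry_count.get() + 1)

def consume_retry_count() -> int:
    # Read and reset at the end of the request so hidden retries can be
    # folded into the reported usage.
    count = _retry_count.get()
    _retry_count.set(0)
    return count
```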
Updates the usage manager to prevent lower (stale) API usage values from overwriting higher (current) local counters during background synchronizations. API providers often return cached data or update in increments, causing local state regression.

- Implement `max(local, api)` logic for request counts to ensure monotonic growth
- Add `force` parameter to `UsageManager` and quota trackers to allow manual overrides
- Preserve accurate local tracking while allowing forced resets
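The merge rule, in essence (a sketch of the described behavior):

```python
def merge_request_count(local: int, api: int, force: bool = False) -> int:
    # Stale API reads can lag behind local tracking; never regress the
    # counter unless a manual override is explicitly requested.
    if force:
        return api
    return max(local, api)
```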
This introduces a compatibility layer that allows the `RotatingClient` to accept Anthropic-format requests, translating them to the internal OpenAI format for processing and converting responses back.

- Implement `AnthropicHandler` with support for `messages` and `count_tokens`.
- Integrate handler into `RotatingClient` to enable direct Anthropic SDK usage.

Also in this commit:
- feat(usage): add `force_refresh_quota` and `reload_usage_from_disk` for manual state management
- refactor(usage): implement `reload_from_disk` to sync local state without external calls
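Directionally, the translation looks something like this rough sketch; the real `AnthropicHandler` necessarily handles much more (system blocks, tool use, streaming deltas):

```python
# Illustrative only -- not the library's actual translation code.
def anthropic_to_openai(payload: dict) -> dict:
    messages = []
    if payload.get("system"):
        messages.append({"role": "system", "content": payload["system"]})
    for msg in payload.get("messages", []):
        content = msg["content"]
        if isinstance(content, list):  # Anthropic content blocks
            content = "".join(
                block.get("text", "")
                for block in content
                if block.get("type") == "text"
            )
        messages.append({"role": msg["role"], "content": content})
    return {
        "model": payload["model"],
        "messages": messages,
        "max_tokens": payload.get("max_tokens"),
    }
```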
@Mirrowel Mirrowel self-assigned this Jan 22, 2026
@Mirrowel Mirrowel added the enhancement, Agent Monitored, and Priority labels Jan 22, 2026
@mirrobot-agent (Contributor)

I'm starting a comprehensive review of this major core refactor. With 62 files changed and a significant shift towards modularity in Client.py and usage_manager.py, I'll be taking a deep dive into the new architecture. Expect my full feedback shortly.

@mirrobot-agent (Contributor) bot left a comment

Overall Assessment

This is a phenomenal refactor of the library core. Moving from the monolithic Client.py and usage_manager.py to a modular, engine-based architecture is a massive leap forward for the project's maintainability and extensibility. The new design is clean, robustly handles complex logic (like fair cycles and shared quota pools), and provides a much more intuitive API through the UsageManager and RotatingClient facades.

The code quality is exceptionally high, with great attention to detail in concurrency control, persistence resilience, and schema migrations.

Architectural Feedback

  • Modularity: The separation of concerns into SelectionEngine, LimitEngine, TrackingEngine, and WindowManager is excellent. It makes the logic much easier to reason about and test.
  • Credential Registry: The introduction of stable IDs based on OAuth metadata or API key hashes is a clever solution to the problem of tracking usage across path changes.
  • Context Manager Pattern: Using CredentialContext for credential acquisition and release ensures that resources are handled correctly even in the face of exceptions.

Key Suggestions

  • Shared Setup in Executor: _execute_non_streaming and _execute_streaming share a lot of initialization and filtering logic. Consolidating this into a helper would reduce duplication.
  • Logging Noise: Consider lowering the log level for credential releases in UsageManager to avoid flooding logs in high-traffic scenarios.
  • Engine Redundancy: The empty release method in TrackingEngine could be removed if it's not intended to hold logic, avoiding confusion about where concurrency state is managed.

Nitpicks and Minor Points

  • I noticed the StreamingHandler is now very cleanly integrated with the new context manager pattern.
  • The migration logic in UsageStorage for v1 -> v2 schema looks solid and handles the legacy path-based keys gracefully.

Questions for the Author

None. The documentation and implementation are very clear.

This review was generated by an AI assistant.

Comment on lines 1716 to 1727
        lib_logger.info(
            f"Released credential {self._mask_accessor(state.accessor)} "
            f"from {model} (remaining concurrent: {remaining}"
            f"{f'/{state.max_concurrent}' if state.max_concurrent else ''})"
        )
    else:
        state.active_requests = max(0, state.active_requests - 1)
        lib_logger.info(
            f"Released credential {self._mask_accessor(state.accessor)} "
            f"from {model} (remaining concurrent: {state.active_requests}"
            f"{f'/{state.max_concurrent}' if state.max_concurrent else ''})"
        )
@mirrobot-agent (Contributor):

In high-concurrency environments, logging every credential release might generate excessive noise in the logs. Consider changing this to DEBUG level or adding a conditional check if detailed concurrency tracking is needed.

Repository owner deleted a comment from ellipsis-dev bot Jan 22, 2026
…counts

- Extract `_aggregate_model_windows` to unify how usage stats are summed across quota groups and credentials, reducing code duplication.
- Implement `_reconcile_window_counts` to ensure success and failure counters remain consistent (mathematically valid) when total request counts are updated from external quota sources (a sketch follows this list).
- Enable synchronization and backfilling of credential-level windows to reflect aggregated model usage.
- Simplify `update_usage` and `_backfill_group_usage` logic by leveraging the new shared aggregation helpers.
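A sketch of the reconciliation invariant; the proportional scaling rule here is an assumption for illustration:

```python
def reconcile_window_counts(total: int, successes: int, failures: int):
    # Invariant: successes + failures must never exceed the externally
    # reported total request count.
    recorded = successes + failures
    if recorded <= total:
        return successes, failures
    # Shrink proportionally so the invariant holds again.
    successes = round(successes * total / recorded)
    failures = total - successes
    return successes, failures
```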
This commit extracts the initialization, credential filtering, and validation steps into a reusable `_prepare_execution` helper method within `RequestExecutor`. This ensures consistency and reduces code duplication between streaming and non-streaming request handlers.

Also in this commit:
- refactor(usage): remove unused `release` method from `TrackingEngine`
@mirrobot-agent (Contributor) bot left a comment

Follow-up Review

This update makes great strides in refining the core refactor. The architectural improvements in request execution and the more robust usage aggregation logic are excellent additions.

Assessment of New Changes:

  • Shared Setup: The consolidation in RequestExecutor is well-implemented and addresses the previous feedback perfectly.
  • Engine Cleanup: The removal of the redundant release method in TrackingEngine simplifies the tracking flow.
  • Improved Aggregation: The new _aggregate_model_windows and _reconcile_window_counts logic in UsageManager provides a much more resilient way to handle shared quota groups and out-of-sync external metrics. It's a significant improvement over the previous model-syncing approach.

Overall Status:
The PR is in excellent shape. The only remaining item is the logging level adjustment for credential releases, which I've flagged again to ensure it doesn't get missed. Once that's addressed, I'm fully confident in this refactor.

This review was generated by an AI assistant.

async with lock:
    state.active_requests = max(0, state.active_requests - 1)
    remaining = state.active_requests
    lib_logger.info(
@mirrobot-agent (Contributor):

As suggested in the previous review, this log level should be changed to DEBUG. In high-concurrency scenarios, logging every credential release at INFO level creates significant noise.

Refactors the usage tracking system to remove the monolithic `UsageStats` structure in favor of distinct `ModelStats`, `GroupStats`, and `TotalStats` containers. This eliminates complex window aggregation and backfilling logic by recording usage directly to relevant scopes via a unified `UsageUpdate` mechanism.

- Replace `UsageStats` with `TotalStats`, `ModelStats`, and `GroupStats` in `CredentialState`
- Remove `_backfill_group_usage`, `_sync_quota_group_counts`, and window aggregation logic
- Centralize request recording in `TrackingEngine` using a new `UsageUpdate` dataclass
- Update storage persistence to serialize/deserialize the new schema
- Adapt limit checkers (`WindowLimitChecker`, `CustomCapChecker`) to access specific stats scopes
- Update client executor and API integration to reflect the new data model
- Add `window_limits_enabled` configuration (default `False`) to allow disabling local window quota blocking.
- Update `LimitEngine` to only include `WindowLimitChecker` when explicitly enabled in config.
- Restore legacy-style logging for quota exhaustion events, including reset time calculations.
- Re-implement fair cycle exhaustion logging to track credential usage status.
- Modify window exhaustion logic to only apply cooldowns when forced.
- Update `UsageManager` logic to apply quota updates to model windows only when no `group_key` is provided.
- Ensure usage is attributed to the correct scope (group vs. model), preventing model window updates when API-level group quotas are active where specific model attribution may be unknown or irrelevant.
…imestamps

- Standardize default configuration by replacing the specific `daily` window with a generic 24h `rolling` window.
- Remove the deprecated `total` window definition and add logic to ignore legacy "total" windows during storage parsing.
- Add `*_human` date string fields (e.g., `reset_at_human`, `last_updated_human`) alongside Unix timestamps in storage dumps to improve debugging and observability.
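The companion-field idea, sketched (the exact date format is an assumption):

```python
from datetime import datetime, timezone

def with_human_fields(record: dict) -> dict:
    # Add a `*_human` string next to each Unix timestamp, as the storage
    # dumps described above do.
    for key in ("reset_at", "last_updated"):
        ts = record.get(key)
        if ts is not None:
            record[f"{key}_human"] = datetime.fromtimestamp(
                ts, tz=timezone.utc
            ).strftime("%Y-%m-%d %H:%M:%S UTC")
    return record
```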
The quota display logic in `RequestExecutor` now implements a fallback strategy. It prioritizes shared group limits but falls back to model-specific limits if no group window is found or if the group window lacks a limit.

Changes to defaults:
- Remove `WindowDefinition.total` and the default infinite tracking window.
- Simplify default window configuration to a single primary daily window (removed 5h window).

Also in this commit:
- chore: update `.gitignore` to anchor paths and exclude `oauth_creds/` and `usage/` directories
The tracking engine now checks `self._config.window_limits_enabled` before evaluating window exhaustion for groups and models. This ensures that usage tracking does not mark resources as exhausted based on window limits when the feature is explicitly disabled.
Inject a shared `provider_instances` cache into core client sub-components to prevent duplicate plugin instantiation.

- Update `CredentialFilter`, `ModelResolver`, `ProviderTransforms`, and `RequestExecutor` to accept an optional `provider_instances` dict.
- Configure `RotatingClient` to pass its central instance registry to these components during initialization.
- Ensure plugin instances are reused across the client lifecycle, improving state consistency and reducing overhead.
This change introduces a `SingletonABCMeta` metaclass to ensure that all `ProviderInterface` implementations behave as singletons. This resolves state fragmentation issues where different components (UsageManager, Hooks, etc.) held separate instances with distinct caches.

- Implement `SingletonABCMeta` and apply to `ProviderInterface`.
- Remove redundant local instance caching in `HookDispatcher` as class instantiation now guarantees a singleton return.
- Add debug logging in `CredentialFilter` to trace instance creation vs. cache hits.
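A minimal sketch of such a metaclass; the library's version may add locking or other details:

```python
from abc import ABCMeta

class SingletonABCMeta(ABCMeta):
    """One instance per concrete class, compatible with ABC machinery."""

    _instances: dict = {}

    def __call__(cls, *args, **kwargs):
        # First instantiation is cached; later calls return the same object.
        if cls not in cls._instances:
            cls._instances[cls] = super().__call__(*args, **kwargs)
        return cls._instances[cls]

class ProviderInterface(metaclass=SingletonABCMeta):
    ...

# Every component now shares the same instance (and therefore its caches):
# SomeProvider() is SomeProvider()  -> True
```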
This commit overhauls how quota exhaustion is determined by delegating the decision to individual providers rather than relying on a generic check in the usage manager.

- Introduce `is_initial_fetch` flag to distinguish between startup synchronization and background refreshes.
- Update `AntigravityQuotaTracker` to only apply API-based exhaustion during initial fetch (due to coarse 20% API update increments), relying on local tracking for subsequent updates.
- Update `BaseQuotaTracker` to strictly validate `reset_timestamp`, ignoring it if the quota is full (1.0 remaining), and applying exhaustion immediately if remaining is 0.
- Refactor `UsageManager` to accept an explicit `apply_exhaustion` boolean from providers.
- Add idempotency check to `TrackingEngine` to prevent duplicate logging of exhaustion events.
…ion parsing

- Introduce `quota_threshold` in Fair Cycle configuration to control exhaustion based on window limit multipliers.
- Implement robust duration string parsing (e.g., "1d2h30m") and flexible cooldown modes (offsets, percentages) for Custom Caps (a parsing sketch follows this list).
- Update `FairCycleChecker` to utilize `WindowManager` for dynamic quota resolution.
- Allow Custom Caps to define limits independent of window clamps and improve cooldown calculation logic.
- Remove redundant explicit exhaustion marking in `TrackingEngine` in favor of centralized checker logic.
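A standalone sketch of duration parsing in the described format; this is illustrative, not the library's implementation:

```python
import re

_DURATION_RE = re.compile(r"(\d+)([dhms])")
_UNIT_SECONDS = {"d": 86400, "h": 3600, "m": 60, "s": 1}

def parse_duration(text: str) -> int:
    """Parse strings like '1d2h30m' into a number of seconds."""
    normalized = text.strip().lower()
    matches = _DURATION_RE.findall(normalized)
    # Reject inputs with leftover characters between/around the parts.
    if not matches or "".join(n + u for n, u in matches) != normalized:
        raise ValueError(f"Unrecognized duration: {text!r}")
    return sum(int(n) * _UNIT_SECONDS[u] for n, u in matches)

# parse_duration("1d2h30m") == 95400
```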
The custom cap logic has been overhauled to independently verify both model-level and group-level limits. Previously, the system resolved a single "most specific" cap, which could allow requests to bypass group limits if a model-specific cap was present (or vice versa).

- Implemented `_find_all_caps` to retrieve both priority/default model caps and group caps simultaneously.
- Updated `check` method to validate every applicable cap against its respective usage scope (model stats vs. group stats).
- Extracted validation logic into `_check_single_cap` for better readability and reuse.
- Added `get_all_caps_for` to expose all active limits for a given context.
@mirrobot-agent (Contributor) bot left a comment

Follow-up Review

This update continues the excellent work on the core refactor. The modularization of the tracking logic into record_usage and the transition to specific ModelStats/GroupStats structures significantly improves the clarity and maintainability of the usage system.

Assessment of New Changes:

  • Consolidated Tracking: The refactor of the TrackingEngine and UsageManager to use record_usage is a great improvement. It eliminates a lot of redundant aggregation logic and makes the data flow much clearer.
  • Provider Singletons: Implementing SingletonABCMeta for providers is a smart move to ensure consistency across different components.
  • Improved Cooldowns: The enhanced custom cap cooldown logic, especially the support for offsets from natural reset, provides much better control over rotation behavior.

Key Issues to Address:

  • Redundant Logic: There is a duplicate code block in _execute_non_streaming in executor.py that should be removed.
  • Duplication Cleanup: A few minor duplicates (constant definition, __all__ entry) and missing docstring parameters were noted.
  • Logging Level: The log level for credential releases in UsageManager still needs to be adjusted to DEBUG as requested previously.

Overall Status:
The architecture is now very robust. Once these minor cleanups are addressed, this refactor is in great shape for merge.

This review was generated by an AI assistant.

async with lock:
    state.active_requests = max(0, state.active_requests - 1)
    remaining = state.active_requests
    lib_logger.info(
@mirrobot-agent (Contributor):

As raised in previous reviews, this log level should be changed to DEBUG to avoid log spam in high-traffic environments.

This introduces a "high-water mark" tracking mechanism for usage windows. Even after a window resets (e.g., a daily limit reset), the system now retains the maximum number of requests ever recorded for that specific window definition.

- Add `max_recorded_requests` and `max_recorded_at` fields to `WindowStats`.
- Update `TrackingEngine` to capture and update peak usage in real-time.
- Ensure `WindowManager` carries forward historical max values when rotating or recreating expired windows.
- Update storage persistence and manager output to include these new metrics.
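In outline, under the assumption that updates happen at record time and on window rotation (the `max_recorded_requests` / `max_recorded_at` fields are named by this PR; the logic below is assumed):

```python
import time

def record_request(stats) -> None:
    stats.request_count += 1
    # Track the peak ever observed for this window definition.
    if stats.request_count > (stats.max_recorded_requests or 0):
        stats.max_recorded_requests = stats.request_count
        stats.max_recorded_at = time.time()

def rotate_window(old_stats, new_stats) -> None:
    # A reset clears the live counters but carries the historical peak
    # forward into the freshly-created window.
    new_stats.max_recorded_requests = old_stats.max_recorded_requests
    new_stats.max_recorded_at = old_stats.max_recorded_at
```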
@mirrobot-agent (Contributor) bot left a comment

Follow-up Review

The addition of historical high-water mark tracking (max_recorded_requests) is a great feature. It integrates well with the persistence layer and properly carries forward across window resets.

Assessment of New Changes:

  • Historical Tracking: The implementation in WindowManager to preserve the max from expired windows is solid.
  • Persistence: The storage schema updates correctly handle the new fields.

Status of Previous Feedback:

  • Unaddressed Duplications: There are still several blatant duplications in executor.py, usage/__init__.py, and defaults.py that were flagged in the previous turn.
  • Logging Level: The UsageManager release log level remains at INFO. Please shift this to DEBUG to prevent log saturation in high-concurrency environments.

Overall Status:
The new functionality is excellent, but please take a moment to address the remaining cleanup items and duplicate logic. These small refinements will significantly improve the production readiness of this core refactor.

This review was generated by an AI assistant.

async with lock:
    state.active_requests = max(0, state.active_requests - 1)
    remaining = state.active_requests
    lib_logger.info(
@mirrobot-agent (Contributor):

As requested in several previous reviews, this log level should be changed to DEBUG to avoid log spam in production environments.

    )
else:
    state.active_requests = max(0, state.active_requests - 1)
    lib_logger.info(
@mirrobot-agent (Contributor):

This log level should also be changed to DEBUG.

…and sorting logic

- Implement support for displaying multiple quota windows (e.g., daily, monthly) per group in dashboard and detail views.
- Refactor `UsageManager` to aggregate quota stats by window structure instead of flattening totals.
- Fix token usage calculations to correctly distinguish between cached and uncached input tokens.
- Add intelligent sorting strategies: providers ordered by attention needed (quota status), and quota groups by lowest remaining percentage.
- Add configuration toggle to show/hide model-level usage details per provider.
- Improve UI layout with adjustable column widths, better cooldown grouping, and natural sort order for credentials.
- Defer initialization of window timers (`started_at`, `reset_at`) until the first request is actually recorded.
- Prevent the display of misleading reset times for windows that have not yet been utilized.
- Update quota synchronization logic to only apply remote reset times and timestamps when usage is detected.
- Change sorting criteria in `QuotaViewer` and `UsageManager` to order by total quota limit (lowest first) rather than remaining percentage.
- Implement alphabetical tie-breaking for groups with equal limits.
- Ensure groups without limits are sorted last.
- Provide a more stable view order that reflects capacity constraints rather than transient usage.
This update enhances the usage statistics tracking and visualization to support advanced proxy features.

- **Usage Logic**: Update `UsageManager` to calculate and expose Fair Cycle exhaustion, group-specific cooldowns, and Custom Cap limits.
- **Status Calculation**: Refine credential status logic to identify a "mixed" state when only specific quota groups are exhausted.
- **UI Indicators**: Add visual markers for Fair Cycle (`:scales:`) and Custom Caps (`:bar_chart:`) in the quota summary and detail views.
- **Rendering**: Standardize emoji usage to Rich text markup (e.g., `:warning:` instead of unicode) across `LauncherTUI` and `QuotaViewer` to ensure consistent alignment and width calculations.
Standardize emoji usage by converting raw unicode characters to `rich` console markup aliases throughout the application. This ensures consistent rendering across different terminal environments and prevents potential encoding issues.

Also in this commit:
- style(ui): adjust column widths in quota viewer for better text alignment
- Remove deprecated `_client_legacy.py` and `_usage_manager_legacy.py` files.
- Refactor `RequestExecutor` to use a `_get_quota_display` helper method, reducing code duplication in logging logic.
- Remove duplicate constant definition in `defaults.py`.
- Fix duplicate export in `usage` module.
@Mirrowel Mirrowel removed the Agent Monitored label Jan 24, 2026
@mirrobot-agent (Contributor) bot left a comment

Follow-up Review

This update continues the refinement of the core modular architecture. The implementation of historical high-water mark tracking (max_recorded_requests) is a great addition, and the emoji updates in the TUIs improve consistency.

Assessment of New Changes:

  • Historical Tracking: The high-water mark logic in TrackingEngine is solid and well-integrated.
  • Documentation: The RequestExecutor.__init__ docstrings now correctly describe the missing parameters.
  • TUI Improvements: The shift to Rich emoji markup in launcher_tui.py and quota_viewer.py looks good.

Status of Previous Feedback:

  • Logging Level: The log level for credential releases in UsageManager is still at INFO. This has been flagged in every review turn so far and really should be shifted to DEBUG to prevent log saturation in high-traffic production environments.
  • Code Duplication: While _prepare_execution unified the initialization, there is still significant identical code shared between _execute_non_streaming and _execute_streaming regarding availability logging, credential acquisition, and request preparation.

Key Suggestions:

  • Unify Request Execution: Several large blocks in executor.py are still duplicated between the streaming and non-streaming paths. Extracting these into helpers (e.g., _log_availability, _prepare_request_kwargs) would improve maintainability.
  • Internal Cleanup: UsageManager has some internal duplications (logging branches and the _mask_accessor helper) that could be simplified.

This review was generated by an AI assistant.

async with lock:
    state.active_requests = max(0, state.active_requests - 1)
    remaining = state.active_requests
    lib_logger.info(
@mirrobot-agent (Contributor):

As requested in multiple previous reviews, this log level should be changed to DEBUG. In high-concurrency environments, logging every credential release at INFO level creates significant noise.

@Mirrowel (Owner, Author):

This is intended to be at info level. Not an issue.

Comment on lines 288 to 307
availability = await usage_manager.get_availability_stats(
    model, quota_group
)
blocked = availability.get("blocked_by", {})
blocked_parts = []
if blocked.get("cooldowns"):
    blocked_parts.append(f"cd:{blocked['cooldowns']}")
if blocked.get("fair_cycle"):
    blocked_parts.append(f"fc:{blocked['fair_cycle']}")
if blocked.get("custom_caps"):
    blocked_parts.append(f"cap:{blocked['custom_caps']}")
if blocked.get("window_limits"):
    blocked_parts.append(f"wl:{blocked['window_limits']}")
if blocked.get("concurrent"):
    blocked_parts.append(f"con:{blocked['concurrent']}")
blocked_str = f"({', '.join(blocked_parts)})" if blocked_parts else ""
lib_logger.info(
    f"Acquiring key for model {model}. Tried keys: {len(retry_state.tried_credentials)}/"
    f"{availability.get('available', 0)}({availability.get('total', 0)}{blocked_str})"
)
@mirrobot-agent (Contributor):

This availability check and logging logic is identical to the one in _execute_streaming (lines 525-544). Consider extracting this into a shared private method like _log_availability.

Comment on lines 314 to 339
availability = await usage_manager.get_availability_stats(
    model, quota_group
)
async with await usage_manager.acquire_credential(
    model=model,
    quota_group=quota_group,
    candidates=untried,
    priorities=filter_result.priorities,
    deadline=deadline,
) as cred_context:
    cred = cred_context.credential
    retry_state.record_attempt(cred)

    state = getattr(usage_manager, "states", {}).get(
        cred_context.stable_id
    )
    tier = state.tier if state else None
    priority = state.priority if state else None
    selection_mode = availability.get("rotation_mode")
    quota_display = self._get_quota_display(
        state, model, quota_group, usage_manager
    )
    lib_logger.info(
        f"Acquired key {mask_credential(cred)} for model {model} "
        f"(tier: {tier}, priority: {priority}, selection: {selection_mode}, quota: {quota_display})"
    )
@mirrobot-agent (Contributor):

This credential acquisition block is duplicated in _execute_streaming (lines 565-590). Since it involves an async with context, you might want to extract the preparation logic or use a helper that returns the context.

Updates deployment configuration and documentation to align with the refactored usage tracking system.

- Replace `key_usage.json` volume mount with `usage/` directory in docker-compose files.
- Update system documentation to reflect new component paths and per-provider usage files.
- Document new CLI options for raw logging and TUI quota viewer.

BREAKING CHANGE: The persistence volume for usage statistics has changed from a file (`key_usage.json`) to a directory (`usage/`). Users must create this directory and update their Docker volume mounts to prevent startup errors and ensure usage data is persisted correctly.
…masking

Refactors the library to improve code reusability and maintainability:

- Extract `_log_acquiring_credential` and `_prepare_request_kwargs` methods in `RequestExecutor` to simplify the main `execute` loop.
- Update `mask_credential` in `error_handler` to support a "full" masking style (first 4 + last 4 characters).
- Remove redundant `_mask_accessor` method from `UsageManager` and replace usage with the centralized `mask_credential` utility.
@Mirrowel Mirrowel marked this pull request as ready for review January 24, 2026 02:15
Repository owner deleted a comment from ellipsis-dev bot Jan 24, 2026
@mirrobot-agent (Contributor) bot left a comment

Follow-up Review

This update makes excellent progress on consolidating the request execution logic and standardizing credential masking. The refactoring of executor.py significantly improves the maintainability of the core library.

Assessment of New Changes:

  • Shared Setup: The introduction of _log_acquiring_credential, _prepare_request_kwargs, and _prepare_execution in executor.py is a great step toward unifying the streaming and non-streaming paths.
  • Credential Masking: The migration to a centralized mask_credential utility in error_handler.py and its consistent use in UsageManager is a solid improvement.
  • Documentation: The updates to DOCUMENTATION.md and README.md correctly reflect the new usage/ directory structure and per-provider persistence.

Status of Previous Feedback:

  • Logging Level: The log level for credential releases in UsageManager is still at INFO. Please shift this to DEBUG as previously requested to prevent log saturation in production.
  • Structural Duplication: While many helpers were extracted, there are still several identical code blocks shared between the two execution paths in executor.py (logging the acquired credential and pre-request callback handling).

Overall Status:
The architecture is now very clean and modular. Addressing the remaining logging level and the few remaining duplications in the executor will bring this refactor to a production-ready state.

This review was generated by an AI assistant.

Refactors the `RequestExecutor` class to improve code maintainability and reduce duplication between the initial request and retry loops.

- Extract `_log_acquired_credential` to centralize detailed logging upon successful credential acquisition.
- Extract `_run_pre_request_callback` to encapsulate callback execution and error handling logic.
- Clean up the `execute` method by replacing verbose inline blocks with these helper methods.
@Mirrowel Mirrowel merged commit 3a2763d into dev Jan 24, 2026
2 of 3 checks passed
@Mirrowel Mirrowel deleted the refactor/core branch January 24, 2026 02:23